ADA-COVID: Adversarial Deep Domain Adaptation-Based Diagnosis of COVID-19 from Lung CT Scans Using Triplet Embeddings

Rapid diagnosis of COVID-19 with high reliability is essential in the early stages. To this end, recent research often uses medical imaging combined with machine vision methods to diagnose COVID-19. However, the scarcity of medical images and the inherent differences in existing datasets that arise from different medical imaging tools, methods, and specialists may affect the generalization of machine learning-based methods. Also, most of these methods are trained and tested on the same dataset, which reduces generalizability and makes the obtained model unreliable in real-world applications. This paper introduces an adversarial deep domain adaptation-based approach for diagnosing COVID-19 from lung CT scan images, termed ADA-COVID. The domain adaptation-based training process receives multiple datasets with different input domains to generate domain-invariant representations for medical images. Also, because medical images are structurally far more similar to one another than other image data in machine vision tasks, we use the triplet loss function to generate similar representations for samples of the same class (infected cases). The performance of ADA-COVID is evaluated and compared with other state-of-the-art COVID-19 diagnosis algorithms. The obtained results indicate that ADA-COVID achieves classification improvements of at least 3%, 20%, 20%, and 11% in accuracy, precision, recall, and F1 score, respectively, compared to the best results of competitors, even without directly training on the same data. The implementation source code of ADA-COVID is publicly available at https://github.com/MehradAria/ADA-COVID.


Introduction
By November 2021, nearly 268 million people worldwide had officially been infected with COVID-19, and more than 5.2 million had died [1] since the epidemic was declared in March 2020. These figures underline the need for rapid and highly reliable diagnosis of COVID-19 in the early stages, not only to save human lives but also to reduce the social and economic burden on the communities involved. Although the RT-PCR (real-time polymerase chain reaction) test is the standard reference for confirming COVID-19, some studies show that this laborious method cannot diagnose the disease in the early stages [2][3][4][5], and some studies report a high false-negative rate [6].
One standard way to identify morphological patterns of lung lesions associated with COVID-19 is to use chest scan images. There are two common techniques for scanning the chest: X-rays and computed tomography (CT). Detection of COVID-19 from chest images by a radiologist is time-consuming, and the accuracy of the diagnosis depends strongly on the radiologist's opinion [7,8]. Also, manually checking every image might not be feasible in emergency cases. Recently, deep learning-based methods [9,10] have been applied to help the medical community diagnose COVID-19 quickly, accurately, and automatically.
Using CT images to diagnose COVID-19 has recently drawn researchers' interest because of the key advantages these images offer: more accurate views of bones, organs, blood vessels, and soft tissues. Using these images, radiologists can identify internal structures in more detail and evaluate their texture, density, size, and shape. Chest images obtained by CT scan usually give a much more accurate picture of the patient's condition than X-rays. Therefore, in recent works based on deep learning, CT scan images are used more than plain radiographs [11][12][13][14].
Deep learning-based methods usually require large datasets to achieve better results and overcome overfitting [15]. In the accurate detection of COVID-19 using chest images, the lack of comprehensive, high-quality datasets is a fundamental problem. The COVID CT dataset was first introduced in [16] and has been used in recent works [11][12][13]. The SARS-CoV-2 CT scan dataset [17] contains 2,482 CT scan images collected from hospitals in São Paulo, Brazil. Another large dataset includes 7,495 positive corona samples and 944 negative ones [18]. These are the largest and most common datasets used in this field, while researchers also use other small public datasets [19][20][21].
It is observed that evaluation results on test samples drawn from the same dataset used for training are significantly better than results on other datasets [22]. In other words, model performance is artificially good when the train and test sets belong to the same dataset, and it drops dramatically when the trained model is evaluated on another dataset. Numerous studies demonstrate that the most recent approaches in the literature are unreliable [23,24]. For example, two well-known studies [10,25] in this field show a performance close to random classification when facing unseen data (i.e., datasets on which the model has not been trained). Likewise, the classification accuracy in [22] decreases from 98.5% on the test set to 59.12% on unseen datasets. The cause of this issue is the structural and inherent differences between the images in the available datasets, which arise from different tools and medical imaging methods. Upon closer inspection, we found that most of the proposed methods for detecting and classifying COVID-19 are trained and tested on images from the same dataset. Using a single dataset during network training reduces the model's generalization. One of the fundamental problems of deep learning is shortcut learning [26]. Shortcuts are decision rules that perform well on typical benchmarks but fail to transfer to more complex testing situations, such as real-world scenarios [26].
This paper proposes an adversarial deep domain adaptation-based approach for diagnosing COVID-19 from lung CT scan images, termed ADA-COVID. In ADA-COVID, two datasets with different input domains are used in the network training process. The goal is to generate similar representations for images belonging to two different domains.
This model can perform the correct classification regardless of the specific features of each input distribution. In other words, the generated representations are domain invariant, and overall better representations are generated. Also, because medical images are structurally far more similar to one another than other image data in machine vision, we use the triplet loss function to train the model. Using this loss function, similar (dissimilar) representations are generated for samples of the same class (different classes) in the embedded space. The contributions of this work are twofold: (1) The effect of structural and intrinsic differences in images obtained from different medical imaging tools and methods is minimized by the introduced domain adaptation-based approach for CT images. (2) A custom deep model is designed based on this approach to make corona case detection more reliable. Therefore, ADA-COVID generalizes well to new datasets and to real-world applications. The performance of the ADA-COVID method is evaluated on two standard datasets, and extensive experiments are performed to examine the effectiveness of each proposed solution.
The results show that our approach achieves higher performance than the existing competitors. The rest of the paper is organized as follows. Section 2 gives a brief review of the related work; in Section 3, the proposed ADA-COVID is described in detail. In Section 4, the experimental results are presented. In Section 5, the conclusions and possible future works are discussed.

Methods Based on Customized Network Architectures.
COVID-Net [10] is one of the earliest methods based on convolutional neural networks designed to detect COVID-19 using X-ray images. CVR-Net (Coronavirus Recognition Network) [31] is a customized model with convolutional layers trained and tested on a combination of CT and X-ray images. In CVR-Net, an average accuracy of 78% was reported for the CT image dataset. Further improvements were added to COVID-Net to improve its representational ability for one specific image modality and to make the network computationally more efficient [35]. CovidCTNet [36] is a set of open-source algorithms used to differentiate COVID-19 from community-acquired pneumonia (CAP) and other lung diseases. This model is designed to work with heterogeneous and small samples independently of the CT imaging hardware. In [13], an autoencoder-based architecture was used to simultaneously segment and classify CT images. Their proposed architecture consists of an encoder and three decoders; these decoders are used for image reconstruction, image segmentation, and classification, respectively. COVID CT-Net [30] is an attentional CNN, which can focus on the infected areas of the chest. All of the introduced approaches in this category propose a customized architecture for detecting infected cases without utilizing any well-established pretrained networks.
In [49], deep learning models were applied to chest CT images to differentiate coronavirus pneumonia from influenza pneumonia. This study was performed on CT images collected from various hospitals in China. Their results have shown the effectiveness of CT images in diagnosing COVID-19. DeepPneumonia [50] was designed to classify COVID-19, bacterial pneumonia, and healthy cases based on CT images. Their proposed model achieved an accuracy of 86.5% for differentiating bacterial pneumonia and COVID-19 and 94% for distinguishing COVID-19 and healthy cases.
In [51], a new method called CONVNet, based on a 3D deep learning framework, was developed for COVID-19 identification. This method can extract three-dimensional and two-dimensional representations.
This method used the ResNet [44] architecture. In [52], a deep transfer learning algorithm was introduced that used X-ray and CT scan images to accelerate the detection of COVID-19 cases. In [53], an attention-based deep learning model using an attention module with VGG-16 was proposed. This method captures the spatial relationship between the ROIs in chest X-ray images. In [54], a new method based on BoVW (Bag of Visual Words) features was proposed. By removing the feature map normalization step and adding a deep features normalization step on the raw feature maps, it helps preserve the semantics of each feature map, which may hold important clues for differentiating COVID-19 from pneumonia.

Methods Based on Handcrafted Feature Extraction.
Some COVID-19 detection methods used handcrafted feature extraction approaches. In [55], different texture features are first extracted from the images by popular texture descriptors, and these texture features are then combined with the features extracted by the pretrained Inception-v3 [56] model. In [57], a method for classifying the positive and negative cases of COVID-19 based on CT scan images was proposed. Different texture features were extracted from CT images using the Gabor filter, and then the SVM method was used to classify these images. In [58], a preprocessing step was applied to the CT slices to reduce intensity variations between them. Then, a long short-term memory (LSTM) classifier was used to discriminate between COVID-19, pneumonia, and healthy cases.
Other related methods based on the combination of feature extraction approaches and deep learning models are introduced in [59].
Most of the mentioned methods depend heavily on the image domain of the datasets on which they were trained. If the test set comes from the same domain as the training set, the model performance is acceptable. However, when the domain of the evaluation dataset is different, model performance is significantly reduced. In real-world applications, the domain of the inference image is not always the same as that of the training set. In other words, unseen data are often independent of the training set, so the results would not be reliable.

Proposed Approach: ADA-COVID
To overcome the problems mentioned in Section 1, we use the domain adaptation technique during model training. With this technique, the generated representations do not depend on the domain of a particular dataset. We also use the triplet loss function in the training phase. With this loss function, pairs of samples from the same class lie closer together in the embedded space than pairs of samples from different classes. Therefore, the representations extracted by the ADA-COVID model are highly discriminative and domain invariant. Figure 1 shows a general overview of the proposed method. As shown in this figure, different input domains are used in the COVID-19 detection process. The aim is to bring the statistical distributions of these domains closer together in the embedded space. The proposed method uses two different input domains, named source and target. The source dataset is used to train the network, and the target dataset is used to improve the generalizability of the network to new datasets. The next step is preprocessing, which includes decoding and resizing, normalization, and augmentation. The output images of the preprocessing stage are fed into the ADA-COVID architecture. ADA-COVID consists of three modules: (1) domain-invariant feature extractor: this module is responsible for extracting features; (2) classifier: this module is responsible for classifying data into two groups, COVID-19 and non-COVID-19; and (3) discriminator: this module is responsible for distinguishing source data from target data. The main aim of the proposed method is to generate similar representations for images belonging to two different domains. This model can perform the correct classification regardless of the specific features of each input distribution. Therefore, the model's generalizability increases, and it performs well on images belonging to different input domains.
The preprocessing step is introduced in Section 3.2, and the proposed ADA-COVID framework is described in Section 3.3.

Preprocessing.
The preprocessing stage prepares the data for training and evaluation of the model. The following paragraphs describe the steps carried out in this regard.

Decode and Resize.
CT scan images are saved in DICOM file format, the widely used format in medical imaging. We need to convert DICOM format images to standard image formats such as JPG or PNG to work with such images. In the proposed method, we convert the images to grayscale PNG format.
In deep neural networks, input images are often resized to maintain compatibility with the network architecture and reduce computations [60]. The proposed method uses the pretrained ResNet50 architecture as a feature extractor with an input size of 224 × 224. Therefore, all images are resized to 224 × 224 pixels for training, validation, and testing.
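The resizing step can be illustrated with a minimal pure-Python sketch. The source does not state which interpolation method is used, so nearest-neighbour sampling is an assumption here, and the helper name `resize_nearest` is hypothetical; a real pipeline would normally rely on an image library instead.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2D grayscale image (list of rows) by nearest-neighbour sampling.

    Nearest-neighbour interpolation is an assumption; the text only states
    the 224 x 224 target size.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to the source pixel it falls on.
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# A tiny 2 x 2 "image" enlarged to the network input size.
tiny = [[1, 2], [3, 4]]
resized = resize_nearest(tiny)
```

Every output pixel copies the closest source pixel, so the result has exactly 224 rows of 224 values regardless of the input size.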

Normalization.
To reduce the effect of intensity variations between CT slices, we normalize the data through (1) in the range [0, 1].
$$Z_i = \frac{X_i - \mu}{\sigma + \varepsilon} \tag{1}$$
where $X_i$ represents the i-th image in the train set, and $\mu$ and $\sigma$ represent the pixel-level mean and standard deviation over all images in the train set, respectively. $\varepsilon = 10^{-10}$ is a negligible value that prevents division by zero, $i$ is the index of each training sample, and $Z_i$ is the normalized version of $X_i$.
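As a small illustration of this normalization, the sketch below assumes the standard form $Z_i = (X_i - \mu)/(\sigma + \varepsilon)$ implied by the definitions of $\mu$, $\sigma$, and $\varepsilon$ above. The function names `fit_stats` and `normalize` and the toy pixel values are hypothetical.

```python
import math

EPS = 1e-10  # the epsilon from the text; guards against division by zero


def fit_stats(train_images):
    """Pixel-level mean and standard deviation over all training images."""
    pixels = [p for img in train_images for row in img for p in row]
    mu = sum(pixels) / len(pixels)
    var = sum((p - mu) ** 2 for p in pixels) / len(pixels)
    return mu, math.sqrt(var)


def normalize(image, mu, sigma, eps=EPS):
    """Apply Z = (X - mu) / (sigma + eps) to every pixel of one image."""
    return [[(p - mu) / (sigma + eps) for p in row] for row in image]


# Two toy 2 x 2 "slices" standing in for the training set.
train = [[[0.0, 50.0], [100.0, 150.0]], [[200.0, 250.0], [25.0, 225.0]]]
mu, sigma = fit_stats(train)
normalized = [normalize(img, mu, sigma) for img in train]
```

The statistics are computed once on the training set and then reused for validation and test images, so that all splits are scaled consistently.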

Data Augmentation.
Data augmentation means increasing the number of training samples by transforming images without losing semantic information. We use five transformations that are randomly applied to samples of the training set. These transformations are selected so that they do not lead to different interpretations by radiologists. The details of these transformations are summarized in Table 1. Figure 2 shows some images after the preprocessing is applied.

ADA-COVID Framework.
As shown in Figure 1, the ADA-COVID framework consists of three main modules: domain-invariant feature extractor, classifier, and discriminator. The following paragraphs describe these modules in detail. The training procedure is also provided in this section.

Domain-Invariant Feature Extractor.
This module extracts image features. Deep convolutional models are commonly used to extract features from images. These models require large amounts of training data to generate good representations. However, in the COVID-19 detection task, the available datasets for network training are very small. Using transfer learning techniques is a practical solution to overcome this limitation. Transfer learning is a well-known representation learning technique in which models trained on large image datasets (such as ImageNet [61]) are used to initialize models for tasks for which the dataset is small. There are two general approaches to transfer learning from pretrained models: feature extraction and fine-tuning [62]. In the first approach, only the weights of some newly added layers (classification layers) are optimized during training, while in the second approach, all the weights (or part of the weights) are optimized for the new task. In the proposed framework, we use the fine-tuning approach.
In the proposed framework, we use the pretrained ResNet50 [44] convolutional model trained on the ImageNet dataset.
ResNet50 was selected after testing the most common pretrained CNN models. This architecture is smaller than other ResNet-based models (such as ResNet101 [44] and ResNet152 [44]) and has fewer parameters. Therefore, its training time is shorter than that of those models. An overview of the ResNet50 model is shown in Figure 3.

Discriminator and Classifier.
As described in Section 3.1, different input domains (source and target datasets) are used in the COVID-19 detection process. The discriminator module is responsible for differentiating source data from target data, and the classifier module is responsible for classifying data into two groups: COVID-19 and non-COVID-19. Figure 4 gives a graphical overview of the model using the adversarial training approach in a multisource transfer learning setting. The classifier and discriminator simultaneously use the features extracted by the feature extractor module to predict the class label and the domain from which the data came. The output predictor (classifier) and the domain classifier (discriminator) are trained classically by backpropagating their respective losses. When the gradient of the domain classifier's loss reaches the feature extractor module, the gradient reversal layer reverses it (multiplies it by −1). As a result, while the feature extractor learns a feature representation that is beneficial for output prediction, it also learns a feature representation that does not discriminate between the domains from which the data come, promoting a more generalized one. The goal of learning representations using this joint architecture is to (1) generate representations that are indistinguishable from each other across domains; (2) increase the model's generalizability; (3) learn representations that are based on essential features that are independent of the specific domain and dataset; and (4) distinguish COVID-19 cases from non-COVID-19 cases with high accuracy. The architectures of the discriminator and classifier modules are almost the same. In the discriminator module, we pass the extracted features into two consecutive blocks consisting of dense, batch normalization, ReLU, and dropout layers. On top of the discriminator module, we use the sigmoid function. The output of this function indicates the probability of assigning each sample to the source or target class.
In the classifier module, in addition to two consecutive blocks consisting of dense, batch normalization, ReLU, and dropout layers, an embedding layer is added on top of the classifier module. This embedding layer is dense and has 64 neurons.
After training the network and learning the appropriate embedding, in the test phase, a dense layer with two neurons and a softmax activation function is added on top of the classifier module so that the network can be used independently as a classifier. Also, the discriminator module is no longer needed in the test phase, so this module is removed in the final application.
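The gradient reversal step described in this section can be sketched in a few lines of plain Python. This is a conceptual illustration only, assuming the reversal is a multiplication by −1 as in standard adversarial domain adaptation; the class `GradientReversal`, its scaling factor `lam`, and the helper `feature_extractor_gradient` are hypothetical names and not part of the published implementation.

```python
class GradientReversal:
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical scaling factor; lam = 1 gives a plain sign flip.
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged on their way to the discriminator.
        return list(features)

    def backward(self, grad_from_discriminator):
        # The discriminator's gradient is multiplied by -lam before it
        # reaches the feature extractor.
        return [-self.lam * g for g in grad_from_discriminator]


def feature_extractor_gradient(grad_classifier, grad_discriminator, grl):
    """Total gradient seen by the feature extractor from both heads."""
    reversed_grad = grl.backward(grad_discriminator)
    return [gc + gd for gc, gd in zip(grad_classifier, reversed_grad)]


grl = GradientReversal()
total = feature_extractor_gradient([1.0, 1.0], [0.5, -0.25], grl)
```

Because the sign is flipped only on the path into the feature extractor, the discriminator itself still learns to separate the domains, while the feature extractor is pushed toward features that make that separation harder.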

Loss Function.
Inspired by [63], we use two losses simultaneously in the network training process to increase the generalizability and transferability of the model. Equation (2) represents the loss of the proposed method.
$$L = \lambda_y L_y + \lambda_d L_d \tag{2}$$
This loss is a combination of a classifier loss ($L_y$) and a discriminator loss ($L_d$). $\lambda_d$ and $\lambda_y$ are the coefficients given to the discriminator and classifier losses, respectively. These coefficients are used to find the optimal balance between variance and bias for better generalizability of the model.
We use the triplet loss function to calculate the classifier output loss ($L_y$) and the cross-entropy loss function to calculate the discriminator domain loss ($L_d$).
Triplet loss was first introduced in FaceNet [64]. The idea is that pairs of samples in the same class should have similar representations, and pairs of samples in different classes should have different representations in the embedded space. In triplet loss, a positive sample and a negative sample are selected for each sample (anchor). The positive sample is selected from the same class as the anchor sample, and the negative sample is selected from the opposite class. Positive and negative samples are selected in each batch, and the loss function is calculated by (3).
$$L_y = \max\left(\left\| f_\theta(A) - f_\theta(P) \right\|_2^2 - \left\| f_\theta(A) - f_\theta(N) \right\|_2^2 + \alpha,\; 0\right) \tag{3}$$
where, in (3), the function $f_\theta$ maps the data into the embedded space and $\theta$ denotes the parameters that must be learned. Thus, $f_\theta(A)$, $f_\theta(P)$, and $f_\theta(N)$ represent the embedded representations of the anchor, positive, and negative samples, respectively. $\|\cdot\|_2$ denotes the Euclidean norm, and $\alpha$ is a margin that ensures the model cannot satisfy the loss trivially, for example by mapping all samples to the same embedding. Due to the many inherent and structural similarities in medical images, this loss function can help to better differentiate data from two different classes in our task. The strong results obtained with this function in the present application support this hypothesis.
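The triplet loss for a single (anchor, positive, negative) triplet can be sketched as follows. The squared Euclidean distance follows the FaceNet formulation and is an assumption about the exact form used here; the function names and the default margin value are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(A, P) - d(A, N) + margin, 0) for one triplet of embeddings."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# The negative is already far from the anchor, so this triplet adds no loss.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Positive and negative are equally far away, so the margin is violated.
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], margin=0.5)
```

Only triplets that violate the margin contribute to the loss, which is why the choice of positives and negatives within each batch matters in practice.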
We use the cross-entropy loss function to calculate the discriminator domain loss ($L_d$).
The cross-entropy loss function is calculated by (4).
$$L_d = -\left[\, y \log \hat{y} + (1 - y) \log\left(1 - \hat{y}\right) \right] \tag{4}$$
In this equation, $y$ represents the actual class, and $\hat{y}$ represents the model output prediction.
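The discriminator loss and the weighted combination of the two losses can be sketched as below. Two points are assumptions rather than statements from the text: the binary form of the cross-entropy (chosen because the discriminator ends in a sigmoid) and the plain weighted sum of the two losses. The default coefficients 4 and 1 are the values reported in the experiments section; the function names and the clipping constant are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for one sample; y_pred is a sigmoid output in (0, 1)."""
    # Clip the prediction so that log(0) can never occur.
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))


def total_loss(l_y, l_d, lambda_y=4.0, lambda_d=1.0):
    """Weighted sum of classifier (triplet) loss and discriminator loss."""
    return lambda_y * l_y + lambda_d * l_d


# A discriminator that cannot tell source from target outputs about 0.5.
l_d = binary_cross_entropy(1, 0.5)
combined = total_loss(0.5, 0.25)
```

With a gradient reversal layer in place, the sign flip for the discriminator term is handled during backpropagation, which is why a simple sum is a reasonable reading of the combined objective.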

Experiments
In this section, the performance of the ADA-COVID model is evaluated. The results are compared with several groups of methods, including [80] and DenseNet-121 + SVM [4]. The batch size and the maximum number of iterations are common to all implemented methods and are set to 32 and $2 \times 10^4$, respectively. The learning rate in the ADA-COVID framework is set to $10^{-2}$, and the Adam optimizer is used. In the proposed algorithm, the required parameters are set as follows: $\lambda_y = 4$ and $\lambda_d = 1$. Due to the small number of images, overfitting may occur. To mitigate this problem, dropout is used along with the data augmentation technique. The dropout rate is set to 0.5. Also, the k-fold cross-validation technique with $k = 5$ is used in the ADA-COVID framework. The experiments are conducted using the Keras framework on a computer with an Intel(R) Core(TM) i7-7700K CPU, 16 GB RAM, and an Nvidia GTX 1080 GPU.
To maximize the reliability of the proposed model, several slices of a patient's CT scan are given to the network, and the average results are reported. In contrast, most of the compared methods reported the best results among different slices of a patient's CT scan.
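The 5-fold cross-validation mentioned above can be illustrated with a short index-splitting sketch. How the folds are combined with the train/test split is not specified in the text, so this only shows the generic technique; the function name `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(10, k=5))
```

Each sample appears in exactly one validation fold, so averaging the metric over the five folds uses every sample once for validation.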

Dataset.
To train and evaluate the model's performance and compare it with other methods, we use the SARS-CoV-2 CT scan dataset [17] as the source dataset and the COVID19-CT dataset [11] as the target dataset. The details of the datasets are summarized in Table 2. A total of 80% of the dataset is selected as the training set, and the remaining 20% as the test set.

Performance Metrics.
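The following five metrics are used to measure the performance in the experiments. Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Accuracy is the number of correct predictions divided by the total number of samples [81]. This metric is calculated as follows.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
Precision is the ratio of correct positive predictions to the number of positive results predicted. This metric is calculated as follows.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$
Recall is the ratio of the number of correct positive predictions to the number of all relevant samples. This metric is calculated as follows.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$
The F1 score is the harmonic mean of precision and recall [82,83]. This metric is calculated as follows.
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$
The specificity rate corresponds to the proportion of negative samples that are correctly considered negative with respect to all negative samples. This metric is calculated by (9).
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{9}$$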
In our experiments, these metrics are expressed as a percentage. A high percentage indicates a better performance.
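The five metrics can be computed directly from the entries of a binary confusion matrix, as in the sketch below. The standard definitions are used; the function name `classification_metrics` and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, F1, and specificity as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * f1,
        "specificity": 100 * specificity,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

Guarding the denominators avoids a division by zero when a class is never predicted or never present in the evaluated split.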

Experiment 1: Evaluation on the Source Dataset.
This section compares the performance of the proposed approach with the state-of-the-art methods on the source dataset; the results are shown in Table 3. The results of other methods have been quoted directly from the relevant publications. Also, for the proposed method, the confusion matrix of the evaluation on the test set of the source dataset is illustrated in Figure 5. From Table 3 and Figure 5, it is evident that ADA-COVID performs better than the other methods. It achieves the best overall performance for all evaluation metrics. The average accuracy, precision, recall, and F1 metrics of ADA-COVID are 99.9%, 99.9%, 99.8%, and 99.9%, respectively. A recall of 99.8% indicates that, on average, only one COVID-19 image is incorrectly predicted as non-COVID-19. Also, the proposed model correctly diagnoses the non-COVID-19 cases with only one false positive. After ADA-COVID, the EfficientNetB0, xDNN, DenseNet201-based, and ShuffleNet methods have relatively good performance, in that order. In the EfficientNetB0 architecture, on average, two COVID-19 images are incorrectly predicted as non-COVID-19. The visual evaluation results of ADA-COVID on 25 random samples from the test dataset are shown in Figure 6. Due to the model's high precision, the random samples contained no failed prediction that could be used to examine the model's possible weaknesses.

Experiment 2: Evaluation on the Target Dataset.
In this section, we evaluate the performance of the proposed approach on the target dataset twice: once without training on it and once with training on it. In the first case, the model is trained on the source dataset and evaluated on the target dataset. In the second case, the proposed model is trained and evaluated independently on the target dataset. In other words, the second dataset input part and the discriminator module are not used in the second case, and the network is trained and evaluated on the target dataset only. The performance of the different methods and models is shown in Table 4. The results of other methods have been quoted directly from the relevant publications. The compared methods are trained and evaluated on the target dataset. As shown in Table 4, ADA-COVID with training on the target dataset has the best performance. The average accuracy, recall, specificity, and F1 metrics are 95.8%, 94.9%, 96%, and 95.2%, respectively. ADA-COVID without training has the second-best performance, with average accuracy, recall, specificity, and F1 metrics of 92.5%, 93.5%, 94.2%, and 93%, respectively. In the without-training mode, the average recall of 93.5% indicates that, on average, eight COVID-19 images are incorrectly predicted as non-COVID-19. Also, the average specificity of 94.2% indicates that non-COVID-19 cases are detected with only seven false-positive samples. In the with-training mode, the average recall of 94.9% indicates that, on average, six COVID-19 images are incorrectly predicted as non-COVID-19. Also, the average specificity of 96% indicates that non-COVID-19 cases are detected with only five false-positive samples.
In the without-training mode, although the proposed approach is not trained on the target dataset, it has better results than the other compared methods. This indicates that the proposed method has significantly increased generalizability, independent of the source dataset.
After ADA-COVID with and without training, the ResNet-50, ResNeXt-101, and ResNeXt-50 architectures have relatively good performance. In these architectures, the average recall of 92.16% indicates that, on average, more than 12 COVID-19 images are incorrectly predicted as non-COVID-19. Also, the average specificity of 90.2% indicates that non-COVID-19 cases are detected with 15 false-positive samples. Among the reported results, AlexNet has the worst performance.

Experiment 3: Crossdataset Evaluation.
To investigate the effect of the domain adaptation used in our approach on the final results, we evaluate the performance of the proposed approach once with domain adaptation and once without it. Also, we train the model once on the source dataset and evaluate it on the target dataset, and once on the target dataset and evaluate it on the source dataset. We compare the proposed method with the method proposed by Silva et al. [22] as the baseline. As shown in Table 5, the proposed method performs better when trained on the source dataset and evaluated on the target dataset than in the reverse case. The reason is that the target dataset is much smaller than the source dataset. Also, the data collected in this dataset come from different sources, with different contrasts and different visual features. Therefore, it is not a suitable dataset for model training. Comparison of the proposed method with the baseline [22] shows that the proposed method improves the results by an average of 30%. As shown in Table 5, the proposed approach performs better with the domain adaptation technique than without it. Using the domain adaptation technique, the results of the proposed approach are improved by an average of 44.10%. Therefore, it can be concluded that better representations are generated by using the domain adaptation technique. As a result, the quality of the diagnoses is better, specifically for unseen new samples.

Experiment 4: ADA-COVID vs. Pretrained Models.
We compare the proposed method with methods in which the models are already pretrained on the ImageNet dataset. As shown in Table 6, the proposed algorithm has a higher performance than other successful methods in this field. The critical point is that the proposed method is trained on the small SARS-CoV-2 CT scan dataset, while the other methods are often trained on a large dataset. Therefore, apart from the qualitative contributions and the proposed innovations that offer a low-cost and practical solution to overcome the shortcut learning problem [26], the proposed method achieves significant improvements using only a small set of training samples without suffering from overfitting. The method presented by Ardakani et al. [84] has marginally higher performance than ADA-COVID in terms of the recall metric; however, it suffers from low reliability. In other words, its performance decreases dramatically in the face of unseen data.
Figure 6: Performance evaluation on 25 random samples from the test set. "I" is the image index, "P" is the predicted value, and "L" is the ground truth label.

Experiment 5: ADA-COVID vs. Customized Models.
This section compares the proposed method with methods that have developed customized architectures specifically to detect COVID-19. In these methods, transfer learning is not used, and the network is trained from scratch. Table 7 shows the results for ADA-COVID and the other compared approaches. As shown in this table, the ADA-COVID method has the best results for all metrics. After ADA-COVID, Elghamrawy and Hassanien [80] have the second-best results. Moreover, Hasan et al. [58] have relatively good performance. Wang et al. [77] have the worst performance among the reported results.

Conclusion
Rapid diagnosis of COVID-19 with high reliability is vital in the early stages of the infection. Using the transfer learning technique, this paper proposed an adversarial deep domain adaptation-based approach (ADA-COVID) for COVID-19 diagnosis from lung CT scan images. Previous studies suffer from shortcut learning when the model is trained using limited training data; furthermore, the state-of-the-art approaches fail to generalize to new samples, achieving poor performance or behaving similarly to random predictors.
Thanks to the proposed domain adaptation between the source and unseen target samples, ADA-COVID guarantees that the generated representations do not depend on the domain of a specific dataset. In addition, since medical images have a high structural resemblance compared to other image data in machine vision tasks, we utilized the triplet loss function for training the proposed model to achieve improved discrimination between positive and negative samples. Finally, the proposed approach can be easily extended to similar applications that utilize medical imaging, such as radiography. ADA-COVID's performance was tested and compared to many state-of-the-art approaches.
The results demonstrated that ADA-COVID achieves significant classification improvements, up to 60%, compared to the best results of competitors, even without directly training on the same dataset.

Data Availability
Previously reported image data were used to support this study and are available at https://doi.org/10.1101/2020.04.24.20078584 and https://doi.org/10.1101/2020.04.13.20063941. These prior studies (and datasets) are cited at relevant places within the text as references [11,17].